Performance Analysis of Application Kernels in Multi/Many-Core Architectures

Authors

Karthik Balasubramanian

Ioan Raicu

Abstract

In recent years, advances in technology and computing have led to huge amounts of data being generated. As a result, High-Performance Computing (HPC) plays an ever-growing role in processing these large datasets in a timely fashion. Our analysis covers a few important throughput-computing application kernels that have a high degree of parallelism, which makes them excellent candidates for evaluation on high-end multi-core CPUs and many-core GPUs. In this work, we compare the performance of important application kernels, namely Image Convolution, Histogram, and Bilateral Filtering, on a multi-core CPU and on many-core NVIDIA GPUs, and we also evaluate our research framework, GPU-enabled Many-Task Computing (GeMTC). GeMTC is an execution model and runtime system that enables NVIDIA GPUs to be programmed with many concurrent and independent tasks of potentially short or variable duration. We provide a thorough performance analysis across CPU, CUDA, and the GeMTC framework, which helps us better understand the behavior of different applications that belong to the Many-Task Computing paradigm. The results show that the GeMTC framework is promising for Many-Task Computing workloads running on NVIDIA GPUs.
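
The abstract attributes the suitability of these kernels to their high degree of parallelism. The following minimal serial Python sketch is an illustration only and is not the authors' CPU, CUDA, or GeMTC implementation. It shows where that parallelism comes from. Several details are our own assumptions: the function names `convolve2d`, `histogram`, and `bilateral`; the parameters `radius`, `sigma_s`, and `sigma_r`; zero padding at image borders; and the list-of-lists grayscale image type.

```python
# Hypothetical illustration: NOT the CPU, CUDA, or GeMTC code evaluated in the paper.
import math
from typing import List

Image = List[List[float]]  # row-major grayscale image (assumption)


def convolve2d(img: Image, kernel: Image) -> Image:
    """Zero-padded 2D convolution with an odd-sized kernel.

    Every output pixel reads only a small neighborhood of the input and
    writes only itself, so each (y, x) iteration is independent and could
    be mapped to one GPU thread.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2  # kernel center
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - oy, x + j - ox
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the image
                        # Flipped kernel index makes this a true convolution.
                        acc += img[yy][xx] * kernel[kh - 1 - i][kw - 1 - j]
            out[y][x] = acc
    return out


def histogram(img: Image, bins: int = 256) -> List[int]:
    """Count pixels per integer intensity bin, clamping to [0, bins - 1].

    Unlike convolution, many pixels can hit the same bin, so a parallel
    version must merge per-thread counts (e.g. atomic adds or private
    sub-histograms that are summed afterwards).
    """
    counts = [0] * bins
    for row in img:
        for v in row:
            b = min(max(int(v), 0), bins - 1)
            counts[b] += 1
    return counts


def bilateral(img: Image, radius: int, sigma_s: float, sigma_r: float) -> Image:
    """Brute-force bilateral filter over a (2*radius+1)^2 window.

    Each neighbor is weighted by spatial distance (sigma_s) and by its
    intensity difference from the center pixel (sigma_r), which smooths
    flat regions while preserving strong edges. Pixels are again
    independent, but each does more arithmetic than plain convolution.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # skip out-of-bounds neighbors
                        v = img[yy][xx]
                        wgt = math.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2)
                                       - (v - center) ** 2 / (2 * sigma_r ** 2))
                        num += wgt * v
                        den += wgt
            out[y][x] = num / den  # den > 0 because the center pixel has weight 1
    return out


if __name__ == "__main__":
    step = [[0.0, 0.0, 10.0, 10.0]]
    print(convolve2d(step, [[1 / 3, 1 / 3, 1 / 3]]))  # box blur smears the step
    print(bilateral(step, radius=1, sigma_s=1.0, sigma_r=0.1))  # edge is preserved
    print(histogram([[0, 1], [1, 3]], bins=4))
```

In the convolution and bilateral sketches, each iteration of the outer `(y, x)` loops is a natural candidate for one GPU thread. The histogram instead needs a reduction or atomic updates, because different pixels can write to the same bin.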

Similar Resources

AutoMatch: Automated Matching of Compute Kernels to Heterogeneous HPC Architectures

HPC systems contain a wide variety of heterogeneous computing resources, ranging from general-purpose CPUs to specialized accelerators. Porting sequential applications to such systems for achieving high performance requires significant software and hardware expertise as well as extensive manual analysis of both the target architectures and applications to decide the best performing architecture...

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources with respect to thread concurrency. In this paper, to resolve this problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...

A portable and high-performance matrix operations library for CPUs, GPUs and beyond

High-performance computing systems today include a variety of compute devices such as multi-core CPUs, GPUs and many-core accelerators. OpenCL allows programming different types of compute devices using a single API and kernel language. However, there is no standard matrix operations library in OpenCL for operations such as matrix multiplication that works well on a variety of hardware from mul...

Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures

This paper presents a benchmarking, performance analysis and optimisation study of the OP2 “active” library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targetin...

Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures

This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targetin...

Publication date: 2014